Stop Word and Related Problems in Web Interface Integration
Abstract
The goal of recent research projects on integrating Web databases has been to enable uniform access to the large amount of data behind query interfaces. The tasks addressed include source discovery, query interface extraction, and schema matching. A number of other tasks are commonly ignored or assumed to be solved a priori, either manually or by some oracle. These include (1) finding the set of stop words and (2) handling occurrences of “semantic enrichment words” within labels. Both subproblems directly affect how synonymy and hyponymy relationships between labels are determined. In (1), a word like “from” is a stop word in general, but it is a content word in domains such as Airline and Real Estate. We formulate the stop word problem, prove its complexity and provide an approximation algorithm. In (2), we study the impact of words like AND and OR on establishing semantic relationships between labels (e.g., “departure date and time” is a hypernym of “departure date”). In addition, we develop a theoretical framework to differentiate the synonymy relationship from the hyponymy relationship among labels involving multiple words. We scrutinize its strengths and limitations both analytically and experimentally. Our experiments use real data from the Web: we analyze over 2300 labels of 220 user interfaces in 9 distinct domains.
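To make the two subproblems concrete, here is a minimal Python sketch. It is not the paper's algorithm: the stop word lists, the per-domain overrides, and the helper names (`stop_words_for`, `conjuncts`, `relate`) are all hypothetical, and the comparison rule is a naive set-based heuristic chosen only to reproduce the two examples given in the abstract.

```python
import re

# Assumed general-purpose stop words (illustrative, not from the paper).
GENERAL_STOP_WORDS = {"of", "the", "a", "in", "from", "to"}
# Assumed per-domain overrides: words that carry meaning in that domain.
DOMAIN_CONTENT_WORDS = {"airline": {"from", "to"}}
# Words treated as "semantic enrichment words" in this sketch.
CONNECTIVES = {"and", "or"}


def stop_words_for(domain):
    """Stop words for a domain: the general list minus domain content words."""
    return GENERAL_STOP_WORDS - DOMAIN_CONTENT_WORDS.get(domain, set())


def conjuncts(label, domain):
    """Split a label on AND/OR and return one content-word set per part."""
    stops = stop_words_for(domain)
    parts, current = [], set()
    for word in re.findall(r"[a-z]+", label.lower()):
        if word in CONNECTIVES:
            if current:
                parts.append(frozenset(current))
            current = set()
        elif word not in stops:
            current.add(word)
    if current:
        parts.append(frozenset(current))
    return parts


def relate(a, b, domain):
    """Naive guess at how label a relates to label b."""
    pa, pb = conjuncts(a, domain), conjuncts(b, domain)
    if not pa or not pb:
        return "unknown"  # a label with no content words tells us nothing
    if set(pa) == set(pb):
        return "synonym"
    if len(pa) > 1 and len(pb) == 1 and pb[0] in pa:
        return "hypernym"  # a covers b and something more
    if len(pb) > 1 and len(pa) == 1 and pa[0] in pb:
        return "hyponym"
    return "unknown"
```

Under these assumptions, `relate("departure date and time", "departure date", "airline")` returns `"hypernym"`, and the label “From” keeps its single word in the hypothetical `airline` domain but reduces to nothing in a domain without overrides, which is why the choice of stop words changes the relationships that can be detected.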
Journal: PVLDB
Volume: 2
Publication date: 2009